Jaccard similarity is a measure of similarity between two documents. Given two documents, doc1 and doc2, the Jaccard similarity takes the set of distinct words in each (say words1 and words2), and then calculates
| words1 ∩ words2 | / | words1 ∪ words2 |
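That is, the number of distinct words that appear in both documents divided by the number of distinct words that appear in at least one of them. The result ranges from 0 (no words in common) to 1 (identical sets of words). The Jaccard similarity is used heavily in document retrieval algorithms.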
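As a minimal sketch of the calculation above, the Python function below builds the two word sets and divides the size of their intersection by the size of their union. The function name jaccard_similarity, the whitespace tokenization, the lowercasing, and the choice to return 0.0 when both documents are empty are all assumptions made for illustration; the original text does not specify how words are extracted.

```python
def jaccard_similarity(doc1: str, doc2: str) -> float:
    """Return |words1 ∩ words2| / |words1 ∪ words2| for two documents."""
    # Assumption: words are whitespace-separated and compared case-insensitively.
    words1 = set(doc1.lower().split())
    words2 = set(doc2.lower().split())

    union = words1 | words2
    if not union:
        # Assumption: two empty documents are given similarity 0.0
        # to avoid dividing by zero.
        return 0.0

    intersection = words1 & words2
    return len(intersection) / len(union)
```

For example, "the cat sat on the mat" and "the cat lay on the rug" share the three distinct words "the", "cat", and "on" out of seven distinct words overall, so jaccard_similarity returns 3/7, roughly 0.43.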